1.   Introduction

The  literature  on  and  practice  of  simultaneous  equations  estimation  may 
be  divided  into  three  (not  always  entirely  distinct)  parts.  The  first  of  these 
concerns  the  pure  theory  of  such  estimation  itself.  A  simultaneous  equation 
system  being  specified,  certain  estimators  are  proposed  for  dealing  with  it; 
the  properties  of  those  estimators  are  investigated,  and  so  forth.  This  line  of 
development  includes  all  of  the  standard  literature  on  simultaneous  equations 
estimation; it is a substantial branch of multivariate statistics of special
interest to economists, and it continues to grow. As it has not grown very quickly
in the last few years, however, and as I surveyed it in some detail on a previous
occasion,¹ I shall perhaps be forgiven if my discussion of this area largely
repeats familiar material. I shall try to be relatively brief in order to
concentrate on the other two parts of the subject.

The  second  main  area  to  be  discussed  is  the  applied  theory  of  simultaneous 
equations estimation. This largely consists of the technical know-how that is
accumulated by practicing econometricians in the course of their business. In
part, this really is the art as opposed to the science of simultaneous equations
estimation, and it is much more difficult than the theory to convey in any
systematic fashion, if, indeed, much of it can be conveyed at all. I have attempted
in part to do so by discussing a list of problems which arise with some frequency
and which lead to errors that, experience suggests, the unwary fall into all too
easily. Here, I trust I shall be forgiven if my tone becomes
pontifically avuncular.

The third area is that of what might be called "metatheory." Simultaneous
equations estimation has been the special province of econometricians, and they
have  tended  to  worry  a  good  deal  over  what  it  is  really  about  in  terms  of  the 
causal  nature  of  economic  models.   Such  discussions  at  times  have  bordered  on  the 
arid,  but  have  also  contributed  insights  into  the  pure  and  applied  theory  of 
estimation  in  economic  models. 

Naturally,  these  three  areas  are  not  really  separate  ones,  and  it  would  be 
pointless  to  try  to  draw  hard  lines  between  them.   As  just  suggested,  metatheo- 
retic  methodological  discussions  have  contributed  to  theory;  further,  it  is 
difficult  to  say  where  pure  theory  leaves  off  and  applied  theory  begins.   Such 
subjects  as  the  effects  of  specification  error  and  the  ways  of  dealing  with  non- 
linearities  tend  to  spread  across  the  border  between  theory  and  practice. 
Nevertheless,  the  tripartite  division  just  described  seems  to  me  a  convenient  one 
for  survey  purposes,  and  I  shall  generally  keep  to  it  where  possible. 

2.   Pure Theory

Suppose that the equation to be estimated is:

(2.1)  $q = Y_1 \beta + Z_1 \gamma + u_1$

where $q$ is a T-component column vector of observations on an endogenous variable;
$Y_1$ is a $T \times m$ matrix of observations on $m$ additional endogenous variables;
$Z_1$ is a $T \times \ell$ matrix of observations on $\ell$ predetermined variables; $\beta$ and $\gamma$ are
vectors of parameters to be estimated; and $u_1$ is a T-component vector of values
of a random disturbance. (I shall refer to variables and their observation vectors
or matrices by the same name.) Until further notice, $u_1$ is assumed to have zero
mean, to have constant variance, $\sigma^2_{u_1}$, and not to be serially correlated.

Equation (2.1) is the first equation in a set of simultaneous equations, the
full model being:

(2.2)  $Y B + Z \Gamma = U$ ,

where the notation is obvious. There are $M$ endogenous variables, $Y$, and $\Lambda$
predetermined variables, $Z$. The defining characteristic of the predetermined variables
is that the disturbance terms (among which is $u_1$) are assumed to be uncorrelated
with them in the probability limit.² The reduced form of the model, obtained by
solving for $Y$, is

(2.3)  $Y = Z \Pi + V$ ,

where

(2.4)  $\Pi = -\Gamma B^{-1}$ ;   $V = U B^{-1}$ .


It  is  useful  to  think  of  estimators  of  (2.1)  as  falling  into  three  main 
classes.   The  first  of  these,  ordinary  least  squares (OLS)  and  some  generalizations, 
estimate  (2.1)  without  regard  for  the  fact  that  it  is  embedded  in  the  simultaneous 
system,  (2.2).   As  is  well  known,  such  estimators  are  inconsistent  (if  nothing 
special  is  said  about  (2.2)).   Their  consistency  would  require  that  the  variables 
in $Y_1$ be uncorrelated with $u_1$ in the probability limit; this is readily seen by


² Rigor would require some additional assumptions to ensure, for example,
that the probability limits of various moment matrices exist. I shall not burden
the discussion with such matters.

(2.3) to be false. OLS applied to the reduced form (2.3) itself is another matter.
Under  the  assumptions  made,  it  is  consistent;  under  somewhat  stronger  assumptions, 
it  is  unbiased.   It  does  not  use  any  of  the  prior  restrictions  (usually  knowledge 
as  to  which  variables  appear  in  which  structural  equations),  and  is  not  generally 
efficient,  at  least  asymptotically. 
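
The contrast between OLS on a structural equation and OLS on the reduced form can be illustrated numerically. What follows is a minimal sketch and not part of the original argument: it assumes a hypothetical two-equation system with one predetermined variable, standard normal disturbances, and illustrative parameter values (`beta`, `gamma`, `delta`, the sample size and the seed are all assumptions made for the example). Here `y1` plays the role of q and `y2` the role of the single column of Y_1 in (2.1).

```python
import random


def simulate(T=20000, beta=0.5, gamma=-1.0, delta=1.0, seed=0):
    """Draw T observations from the (hypothetical) system
         y1 = beta*y2 + u1
         y2 = gamma*y1 + delta*z + u2
    by solving it for the reduced form."""
    rng = random.Random(seed)
    det = 1.0 - beta * gamma
    y1, y2, z = [], [], []
    for _ in range(T):
        zt, u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y2t = (delta * zt + gamma * u1 + u2) / det  # reduced form for y2
        y1.append(beta * y2t + u1)                  # first structural equation
        y2.append(y2t)
        z.append(zt)
    return y1, y2, z


def slope(x, y):
    """OLS slope of y on x without intercept (all variables have mean zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


y1, y2, z = simulate()
beta_ols = slope(y2, y1)  # OLS on the structural equation: y2 is correlated with u1
pi1 = slope(z, y1)        # OLS on the reduced form for y1
pi2 = slope(z, y2)        # OLS on the reduced form for y2

print("structural OLS estimate of beta:", round(beta_ols, 3))
print("reduced-form OLS estimates     :", round(pi1, 3), round(pi2, 3))
```

Under these assumed values the true reduced-form coefficients are 1/3 and 2/3, and the reduced-form estimates settle near them, whereas the structural OLS estimate of `beta` converges to about zero rather than to the true value of 0.5.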

The  second  principal  class  of  estimators  does  take  account  of  the  fact  that 
the equation to be estimated is part of a simultaneous system, but uses prior
restrictions only on the equation to be estimated, ignoring what may be known about
the rest of the system (except for the list of endogenous and predetermined
variables). I shall call such estimators "limited information" estimators. There
are many such, the two most important being limited information, maximum likelihood
(LIML) and two-stage least squares (2SLS). Most of the simultaneous equation
estimation done in practice uses some variant of these estimators.

The  third  principal  class  of  estimators  uses  the  fact  that  the  equation  to 
be  estimated  is  part  of  a  simultaneous  system,  and  also  uses  all  the  prior  restric- 
tions on  that  system  in  estimating  any  equation  thereof.    Indeed,  while  the 
limited  information  estimators  proceed  equation-by-equation,  using  information  on 
only  one  equation  at  a  time,  the  estimators  which  I  shall  call  "full  information" 
estimate  the  entire  system  at  once,  using  all  the  information  available.   Parallel 
to  the  case  of  the  limited  information  estimators,  the  leading  members  of  the  full 
information  class  are  full  information,  maximum  likelihood  (FIML)  and  three-stage 
least squares (3SLS).

With  this  classification  in  mind,  I  shall  proceed  in  what  may  seem  an  unnat- 
ural order  and  discuss  the  full  information  estimators  first. 

2.1  Full Information Estimators

Full information, maximum likelihood proceeds in the way which seems most
natural to mathematical statisticians; it maximizes the likelihood function subject
to all the constraints provided by all the prior information. Unfortunately, this
straightforward procedure is extraordinarily complex computationally (primarily
because of the appearance of $B^{-1}$ in (2.3)), and FIML estimates were for many years
simply out of the question. With improvements in computer technology and in the
inventive use of software, that situation has in part been rectified,³ although
computation of FIML estimates for large models remains out of the question.

The computational difficulties of FIML, however, are not dispositive of the
question of full-information estimators, since it is now known⁴ that three-stage
least squares (among other estimators) has the same asymptotic distribution as
FIML. Since 3SLS does not present anything like the same computational difficulties
and since everything we know about FIML is asymptotic anyway, this makes 3SLS the
leading contender in the full-information class.

Three-stage least squares can be roughly described as a clever application
of Aitken's generalized least squares to a system that has been first purified of
simultaneous equation difficulties by 2SLS, and then rewritten as one big regression
equation along the lines used by Zellner [37] in dealing with several seemingly
unrelated regression equations.⁵ It uses the prior restrictions equation-by-equation
in the 2SLS part, and then combines them by reestimating the entire system
as one equation and taking advantage of the fact that the disturbances from the
different equations will (generally) be correlated. As its name suggests, it is in
some sense a relatively full application of the least squares principle of estimation
to this class of problems, just as FIML is a full application of the maximum
likelihood principle.

³ See Eisenpress [8].

⁴ See Madansky [23], Rothenberg and Leenders [29], and Sargan [31].

⁵ Three-stage least squares was developed by Zellner and Theil [38]. Details
may be found in any standard text.
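
In matrix terms, the verbal description of three-stage least squares can be sketched as follows. The notation is an assumption introduced here for illustration and is not the paper's own: the i-th structural equation is written $y_i = X_i \delta_i + u_i$ with $X_i = [Y_i \; Z_i]$, $Z$ is the matrix of all predetermined variables, and $\hat{\Sigma}$ is the covariance matrix of the structural disturbances estimated from the 2SLS residuals.

```latex
% Projection on the predetermined variables and the 2SLS estimator of equation i
P_Z = Z (Z'Z)^{-1} Z' , \qquad
\hat{\delta}_i^{\,2SLS} = \left( X_i' P_Z X_i \right)^{-1} X_i' P_Z \, y_i .

% Disturbance covariances estimated from the 2SLS residuals
\hat{\sigma}_{ij} = \frac{1}{T}\, \hat{u}_i' \hat{u}_j , \qquad
\hat{u}_i = y_i - X_i \hat{\delta}_i^{\,2SLS} .

% Stack the M equations: y = (y_1', \dots, y_M')', \; X = \mathrm{diag}(X_1, \dots, X_M),
% and apply generalized least squares to the stacked, "purified" system
\hat{\delta}^{\,3SLS} =
\left[ X' \left( \hat{\Sigma}^{-1} \otimes P_Z \right) X \right]^{-1}
X' \left( \hat{\Sigma}^{-1} \otimes P_Z \right) y .
```

If $\hat{\Sigma}$ is diagonal, the stacked expression separates by equation and reproduces the 2SLS estimates; the gain from the third stage comes entirely from the correlation of disturbances across equations.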

The question might well be asked as to why the discussion of different estimators
does not end here. Having the same asymptotic distribution as FIML, 3SLS is
asymptotically efficient. Since we still know very little about non-asymptotic
properties of any of these estimators, why doesn't 3SLS dominate all other estimators
when it is available?⁶

In a happier world, where all assumptions were true and all models the way in
which theory supposes them to be, this would indeed be the case. Unfortunately,
we do not live in such a world. Apart from the fact that full information estimators
tend to be more sensitive to multicollinearity than the limited information estimators,⁷
they are also more sensitive than the latter to specification error in a particularly
unpleasant way.

Just because full information estimators use all the prior information in the
system to estimate all the parameters at once, they carry the effects of a mistake
in that information into all the estimates. The very property which makes them
asymptotically efficient also makes them peculiarly vulnerable. This means that a
specification error in a particular equation will not only affect the estimates of
the parameters of that equation, but will also affect the estimates of every parameter
of the model. Since one is ordinarily not certain as to specification and,
perhaps more importantly, since one is ordinarily more certain about the specification
of some equations than of others, it is very undesirable to have poor specification
of one part of the model affect the results in the other areas. This is
particularly so if some equations are of more interest than others, which appear
largely to close the model. Since limited-information estimators, as one would expect,
do quarantine the effects of specification error,⁸ they are greatly to be preferred
to full-information estimators on this very important count.

⁶ It is not available when the sample size is too small to permit the computation
of the OLS estimate of the reduced form equations with which 3SLS begins. I
shall return to such cases below. It is also computationally difficult for very
large models because of the order of the matrices involved.

⁷ See Klein and Nakamura [20].

2.2      Limited-Information  Estimators 


As already stated, the two leading estimators in the limited-information class
are LIML and 2SLS, although there are others (all the other members of Theil's
k-class and some other estimators as well).

LIML proceeds by maximizing the likelihood function subject only to the constraints
imposed by the prior information on the equation being estimated. 2SLS,
on the other hand, proceeds by replacing $Y_1$ in (2.1) with $\hat{Y}_1$, the latter being the
systematic part of the regression of the variables in $Y_1$ on all the predetermined
variables (the OLS estimate of part of the reduced form), and then regressing $q$ on
$\hat{Y}_1$ and $Z_1$.
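
The two stages just described can be written out directly. The following is a dependency-free sketch under stated assumptions, not a production implementation: the data-generating process (one included endogenous variable, one included and two excluded predetermined variables, and the particular coefficient values) is hypothetical, and the helper names `ols` and `two_stage_least_squares` are introduced only for this illustration.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given regressor columns."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
         + [sum(p * q for p, q in zip(ci, y))] for ci in columns]
    for i in range(k):  # Gauss-Jordan elimination (X'X is positive definite)
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [row[k] for row in a]


def two_stage_least_squares(q, Y1, Z1, Z_all):
    """2SLS for q = Y1*beta + Z1*gamma + u.
    Y1, Z1 and Z_all are lists of columns; Z_all holds every predetermined variable."""
    fitted = []
    for col in Y1:
        # Stage 1: systematic part of the regression of each column of Y1 on Z_all.
        pi = ols(Z_all, col)
        fitted.append([sum(p * zc[t] for p, zc in zip(pi, Z_all))
                       for t in range(len(col))])
    # Stage 2: regress q on the fitted values and the included predetermined variables.
    return ols(fitted + Z1, q)


# Hypothetical data: y is endogenous because its disturbance v is correlated with u.
rng = random.Random(1)
T = 5000
z1 = [rng.gauss(0, 1) for _ in range(T)]
z2 = [rng.gauss(0, 1) for _ in range(T)]
z3 = [rng.gauss(0, 1) for _ in range(T)]
u = [rng.gauss(0, 1) for _ in range(T)]
v = [0.8 * ut + rng.gauss(0, 1) for ut in u]
y = [a + b + c + d for a, b, c, d in zip(z1, z2, z3, v)]
q = [0.5 * yt + 1.0 * z1t + ut for yt, z1t, ut in zip(y, z1, u)]

b_ols = ols([y, z1], q)                                     # ignores simultaneity
b_2sls = two_stage_least_squares(q, [y], [z1], [z1, z2, z3])
print("OLS  (beta, gamma):", [round(b, 3) for b in b_ols])
print("2SLS (beta, gamma):", [round(b, 3) for b in b_2sls])
```

With the assumed true values of 0.5 and 1.0, the 2SLS estimates lie close to the truth while the OLS estimate of the coefficient on the endogenous variable is biased upward. Only the point estimates are computed; the second-stage OLS standard errors would not be the correct 2SLS standard errors.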

Both  of    these   estimators    are  known    to   have    the   same   asymptotic  distribution, 
so  on   asymptotic    grounds    there   is  nothing   to   choose  between   them.        Unfortunately, 
small   sample  properties   are  very  hard    to  establish.      Such   things   as   are  known  are 
almost   all  properties   of   2SLS   in  relatively  simple   small  models,    and   do  not   provide 
much  help   in   choosing  between   the   estimators. 

Some guidance on that choice is provided by Nagar [26], who found the k-class
estimator for which the bias approaches zero faster than 1/T, where T is
the sample size. Nagar's estimator has this property (as one would expect, considering
the difficulties in the single-equation case) only for models in which
there are no lagged endogenous variables. For simultaneous equation models, this
is a very restricted case, since most models do contain dynamic features. In any
case, lack of bias alone is not a very important property of an estimator.

⁸ See Fisher [10].

More recently, some light has been thrown on these issues by a new approach
due to Kadane [19], who considers what happens when the error term in the equation
to be estimated has very small variance. He argues that such "small-σ" asymptotics
are just as meaningful as the usual "large-T" asymptotics for making inferences
about the properties of estimators in actual situations. It turns out in the case
of small σ, unlike the case of large T, that LIML and 2SLS do behave differently,
the difference turning on the degree to which the equation to be estimated is
overidentified. If the degree of overidentification is relatively small (the number
of variables omitted from (2.1) between M-1 and M+5), then 2SLS is superior to LIML
for very small σ, the roles being reversed if the degree of overidentification is
high. While these results seem promising, I think it is fair to say that their
implications remain to be fully understood.

In almost all other respects, significant differences between LIML and 2SLS
have failed to emerge. We do know that, just as full-information estimators tend to
be more sensitive to multicollinearity than are the limited-information estimators,
so LIML tends to be more sensitive than is 2SLS.⁹ On the other hand, the effects of
specification error, which weighed heavily against full-information estimators, do
not really distinguish the two limited-information estimators from each other. We
do know that all limited-information estimators quarantine the effects of specification
error in a particular equation; we also know that, for sufficiently small
specification error, the inconsistencies introduced in LIML and 2SLS become negligible.¹⁰
Interestingly enough, it turns out despite this that small specification
errors generally have different effects on the probability limits (or the asymptotic
root mean square errors) of LIML and 2SLS, but the hope that this may serve to
provide a general criterion on which to choose between them has been shown to be
forlorn. Which of them is more robust to specification error cannot be determined
without some statement as to what the nature and direction of that error might be.¹¹

⁹ Klein and Nakamura [20].

The remaining area in which attempts have been made to distinguish between 2SLS
and LIML is of quite another sort. It has been argued that the fact that LIML
treats all endogenous variables symmetrically, while 2SLS requires (and is not
invariant to) the choice of a normalization rule, weighs in favor of the former
estimator as more natural.¹² On the other hand, it can also be argued that natural
normalization rules do exist, which would make 2SLS the more natural estimator.¹³ This
discussion essentially belongs to the realm of metatheory, and I shall return to it
in the section devoted to that subject. For the present, suffice it to say that no
matter which side of the normalization question one agrees with, the "naturalness"
of the resulting estimator carries no weight compared to that of a difference in the
actual properties thereof. We simply do not know whether the imposition of a
normalization rule where none exists or the failure to impose one which does exist has
any systematic effect on the small sample properties of the resulting estimator (it
has no asymptotic effect). Lacking such knowledge, the discussion of the normalization
question becomes (as are all good discussions) somewhat academic.


¹⁰ Fisher [10].

¹¹ See Fisher [13] and [14].

¹² See Chow [5].

¹³ See Fisher [11].


3.  Applied Theory

This  section  is  less  systematically  organized  than  the  preceding  one.   It 
consists  of  a  discussion  of  a  number  of  topics  which  are  of  more  or  less  practical 
importance.   Some  of  these  are  not  connected  to  the  others,  and  some  of  them  could 
have  been  introduced  in  the  preceding  section.   To  an  extent,  the  present  section 
discusses  errors  that  are  sometimes  made  in  practice. 

3.1  Recursive Systems

This  topic  could  certainly  have  been  discussed  in  the  pure  theory  section, 
and  its  genesis  certainly  belongs  to  the  realm  of  metatheory,  where  we  shall  en- 
counter it  again.   I  have  chosen  to  discuss  it  here  because  there  is  often  some 
deviation  between  the  theory  and  its  application. 

Briefly,  the  theory  of  recursive  systems  runs  as  follows.   If  the  matrix  of 
coefficients  of  current  endogenous  variables  (B  in  equation  (2.2))  is  triangular 
and  the  covariance  matrix  of  current  disturbances  is  diagonal,  then  the  equation 
system  is  said  to  be  recursive.   In  that  case,  it  is  not  hard  to  show  that,  despite 
the  simultaneous  appearance  of  the  system,  ordinary  least  squares  is  in  fact  a 
consistent estimator; as a matter of fact, it is the full-information, maximum
likelihood estimator.

This is a beautiful theorem¹⁴ whose practical importance is, alas, far less
than is generally supposed. Triangular coefficient matrices are moderately frequent
(particularly when practicing econometricians see the home country of OLS
ahead of them), but it is fairly generally forgotten that the theorem has another
condition aside from this, namely, the diagonality of the covariance matrix of
the disturbance terms. That diagonality is required for the applicability of OLS.
Indeed, if no further restrictions are placed on the equation system, such diagonality
is necessary for the very identifiability of all but one of the equations.¹⁵
No consistent estimator exists without it in that case.

¹⁴ Due to Wold [35].

The  trouble  is,  of  course,  that  information  allowing  us  to  believe  that  dis- 
turbances from  different  equations  are  uncorrelated  is  extremely  hard  to  come  by. 
Economic  theory  can  tell  us  that  the  coefficient  matrix  is  triangular;  it  generally 
tells  us  nothing  whatsoever  about  the  relations  between  disturbances.   This  being 
so, it is best to be as general as conveniently possible and not to rest the validity
of one's entire estimation procedure on very specific assumptions about the
disturbances. While it is true, as we shall see in the discussion of metatheory,
that  the  concept  of  a  predetermined  variable  rests  in  part  on  assumptions  that  dis- 
turbances applying  to  equations  in  widely  different  parts  of  the  socio-economic 
system  are  mutually  uncorrelated,  such  an  assumption  can  be  swallowed  considerably 
more  easily  in  the  case  of  such  widely  separated  equations  than  in  the  case  of 
closely  connected  ones.   Shocks  which  disturb  one  side  of  a  particular  market  are 
rather  likely  also  to  disturb  another.   Good  practice  requires  at  least  a  facing 
up  to  the  problem  and  an  affirmative  argument  that  the  correlation  between  distur- 
bances is  not  likely  to  be  great,  if  such  an  argument  can  be  given. 
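
How much rides on the diagonality assumption can be seen in a small simulation. This is a minimal sketch under assumptions that are not in the text: a hypothetical two-equation triangular system with standard normal variables, an illustrative coefficient `beta`, and a disturbance correlation `rho` that is either zero or 0.6. The predetermined variable `z` is excluded from the second equation, so that equation remains identified even when the disturbances are correlated.

```python
import math
import random


def ols_second_equation(rho, T=20000, beta=0.5, seed=0):
    """Simulate the triangular (hypothetical) system
         y1 = z + u1
         y2 = beta*y1 + u2,    corr(u1, u2) = rho
    and return the OLS slope of y2 on y1."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(T):
        z, u1, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u2 = rho * u1 + math.sqrt(1.0 - rho * rho) * e  # unit variance, corr rho with u1
        y1 = z + u1
        y2 = beta * y1 + u2
        sxy += y1 * y2
        sxx += y1 * y1
    return sxy / sxx


b_diag = ols_second_equation(rho=0.0)  # diagonal covariance matrix: system is recursive
b_corr = ols_second_equation(rho=0.6)  # triangular B, but correlated disturbances
print("OLS with uncorrelated disturbances:", round(b_diag, 3))
print("OLS with correlated disturbances  :", round(b_corr, 3))
```

With uncorrelated disturbances the OLS estimate is close to the assumed value of 0.5. With a correlation of 0.6 the coefficient matrix is exactly as triangular as before, yet OLS converges to about 0.8 under these assumed variances.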

And such an argument will have to be an a priori one. It is not in general
possible to test for the crucial zero correlations; indeed, as with all identifying
assumptions, it is less possible the more crucial it is. In the purest case
in which nothing further is known about the system beyond the triangularity of B,
it is easy to show that computation of the least squares residuals and testing of
the crucial disturbance correlations by use of the correlations between those
residuals will invariably lead to acceptance of the hypothesis of zero correlation
(provided there is enough rounding error in the program to keep one from being
suspicious of the fact that all the test statistics are identically zero). Even
where more restrictions are placed on B and $\Gamma$, such a test is hopelessly biased.¹⁶
A proper test can be made only if the diagonality of the covariance matrix is
not required for identification and, in order to make such a test, one would have
to compute consistent parameter estimates with a simultaneous equation technique
anyway.
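
The remark that the test statistics are identically zero is easy to verify. The sketch below assumes a hypothetical triangular system in which each equation contains a constant, the predetermined variable and all earlier endogenous variables, and in which the true disturbance correlation is 0.8; the helper names and parameter values are assumptions made for the illustration. Because the first-equation residual is a linear combination of regressors of the second equation, it is orthogonal to the second-equation residual by construction.

```python
import math
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given regressor columns."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y].
    a = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns]
         + [sum(p * q for p, q in zip(ci, y))] for ci in columns]
    for i in range(k):  # Gauss-Jordan elimination (X'X is positive definite)
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [row[k] for row in a]


def residuals(columns, y):
    """OLS residuals of y on the given regressor columns."""
    coef = ols(columns, y)
    return [yt - sum(c * col[t] for c, col in zip(coef, columns))
            for t, yt in enumerate(y)]


def correlation(a, b):
    """Uncentered correlation; equals the usual one for zero-mean series."""
    return sum(p * q for p, q in zip(a, b)) / math.sqrt(
        sum(p * p for p in a) * sum(q * q for q in b))


rng = random.Random(2)
T = 2000
ones = [1.0] * T
z = [rng.gauss(0, 1) for _ in range(T)]
u1 = [rng.gauss(0, 1) for _ in range(T)]
u2 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in u1]  # true correlation with u1 is 0.8
y1 = [1.0 + zt + a for zt, a in zip(z, u1)]
y2 = [0.5 * y1t + zt + b for y1t, zt, b in zip(y1, z, u2)]

e1 = residuals([ones, z], y1)      # equation 1: y1 on constant and z
e2 = residuals([ones, y1, z], y2)  # equation 2: y2 on constant, y1 and z

true_corr = correlation(u1, u2)
resid_corr = correlation(e1, e2)
print("correlation of true disturbances:", round(true_corr, 3))
print("correlation of OLS residuals    :", resid_corr)
```

The residual correlation differs from zero only by rounding error, whatever the true correlation of the disturbances, so it carries no information about the hypothesis being tested.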

3.2   Lagged  Endogenous  Variables  and  Autocorrelated  Disturbances 

A  closely  related  question  concerns  the  use  of  lagged  endogenous  variables 
as  predetermined.   I  observed  above  that  triangular  coefficient  matrices  are 
fairly common. This occurs in part because econometricians, uncertain as to lag
structures and believing that least squares lies ahead of them, occasionally
decide in favor of a short time lag rather than full simultaneity. Such a decision
(to lag an endogenous variable) makes an element of B zero and places the lagged
endogenous variable in the list of predetermined ones. This is particularly
easy to do if the time interval of the model is short; reactions certainly do take
time and if intervals were sufficiently short, simultaneity might indeed disappear.

I  shall  return  to  a  discussion  of  whether  this  would  indeed  be  so  in  the 
section  on  metatheory.   In  the  present  context,  the  important  thing  to  notice  is 
that  the  disappearance  of  simultaneity  consequent  on  lagging  endogenous  variables 
is  often  merely  apparent.   The  same  argument  which  says  that  it  is  often  hard  to 
tell  the  difference  between  the  effects  of  a  current  endogenous  variable  and 
those of the same variable with a short lag also suggests that it is difficult
to distinguish between the current disturbance and the same disturbance with a
small  lag.   Particularly  if  a  triangular  coefficient  matrix  has  been  obtained  by 
going  to  a  short  time  period,  autocorrelation  in  the  disturbances  is  likely  to  result. 
In  this  case,  it  is  a  mistake  to  believe  that  a  lagged  endogenous  variable  can 
safely  be  placed  in  the  list  of  predetermined  variables;  such  a  lagged  endogenous 
variable  will  not  be  uncorrelated  with  the  current  disturbance  in  the  equation  to 
be  estimated,  and  the  pretense  that  it  is  —  in  the  extreme  case,  the  use  of  OLS  — 
will  lead  to  inconsistency. 

The  problem  is  closely  related  to  that  of  the  correlation  between  disturbances 
in  recursive  systems  because  in  both  cases,  the  simultaneous  equations  problem  occurs 
not  by  reason  of  a  feedback  effect  in  the  model,  but  because  of  one  implicit  in  the 
disturbance  structure.   In  the  present  case,  the  equation  which  explains  a  lagged 
endogenous  variable  (a  lagged  equation  of  the  model)  has  a  disturbance  which  is  cor- 
related with  the  current  disturbances  of  the  model.  This  prevents  the  use  of  the 
lagged  endogenous  variable  as  predetermined,  just  as  the  correlation  of  the  disturb- 
ance from  the  equation  explaining  a  particular  endogenous  variable  in  a  recursive 
model  with  the  other  disturbances  in  the  model  prevents  the  use  of  that  endogenous 
variable  as  predetermined  in  the  estimation  of  the  other  equations. 

There is a further similarity as well. Just as the testing of the zero correlation
assumption in a recursive model is impossible using the OLS residuals, so
too the testing of the no autocorrelation assumption in the present case is impossible
using the OLS residuals. A test which is consistent only if the null hypothesis
is true cannot be used to test that hypothesis, and the Durbin-Watson statistic
is not a test for autocorrelation in an equation with lagged endogenous variables.¹⁷
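
Both points, the inconsistency of OLS and the unreliability of the Durbin-Watson statistic computed from its residuals, show up in the simplest possible case. The sketch below assumes a hypothetical single equation with one lagged endogenous variable and a first-order autoregressive disturbance; the values `lam = 0.5` and `rho = 0.5`, the sample size and the seed are illustrative assumptions.

```python
import random


def simulate(T=20000, lam=0.5, rho=0.5, seed=0):
    """Generate y_t = lam*y_{t-1} + u_t with u_t = rho*u_{t-1} + e_t."""
    rng = random.Random(seed)
    y, y_prev, u_prev = [], 0.0, 0.0
    for _ in range(T + 200):  # the first 200 draws are discarded as burn-in
        ut = rho * u_prev + rng.gauss(0, 1)
        yt = lam * y_prev + ut
        y.append(yt)
        y_prev, u_prev = yt, ut
    return y[200:]


def slope(x, y):
    """OLS slope of y on x without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


y = simulate()
lam_ols = slope(y[:-1], y[1:])  # lagged y treated as predetermined
resid = [yt - lam_ols * yl for yl, yt in zip(y[:-1], y[1:])]
r_resid = slope(resid[:-1], resid[1:])  # first-order autocorrelation of OLS residuals
dw = (sum((a - b) ** 2 for a, b in zip(resid[1:], resid[:-1]))
      / sum(e * e for e in resid))      # Durbin-Watson statistic

print("OLS estimate of lam (true value 0.5)  :", round(lam_ols, 3))
print("residual autocorrelation (true rho 0.5):", round(r_resid, 3))
print("Durbin-Watson statistic                :", round(dw, 3))
```

Under these assumed values OLS converges to about 0.8 rather than 0.5, and the residual autocorrelation to about 0.2 rather than 0.5, so the Durbin-Watson statistic sits near 1.6 where the true disturbances would give a value near 1.0. The statistic is pulled toward 2, the value that signals no autocorrelation.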

But, of course, the use of lagged endogenous variables as predetermined is
not  restricted  to  cases  in  which  so  doing  leads  to  ordinary  least  squares.   In  more 
general  models  as  well,  such  a  treatment  leads  to  inconsistent  estimates  if  dis- 
turbances are  in  fact  autocorrelated.   In  such  a  situation,  there  are  two  ways  to 
proceed. 

The  first  of  these  is  to  make  a  specific  assumption  about  the  form  of  the  pro- 
cess generating  the  disturbances,  to  assume  autocorrelation  is  of  first  order,  or 
that  the  process  is  a  moving  average  one,  or  the  like.   Having  parametrized  that 
process,  the  parameters  involved  can  (in  principle)  be  estimated,  together  with  the 
rest  of  the  system  by  various  techniques.   (The  limiting  case  of  this  is  to  assume 
zero autocorrelation.)  Even for single-equation models, this is a minor nuisance;
simultaneous  equations  models  raise  bigger  problems. 

The difficulty is that one never knows in fact what the process generating the
disturbance is like. In a way, autocorrelation in the disturbances is a symptom
that something systematic and not understood has been excluded from the model. If
one tries to be very specific about that process, one runs an obvious risk of
specification error. This is particularly so if one tries to keep down the number of
new parameters. In a simultaneous system, for example, does it really make sense
to assume that the disturbance from a given equation is correlated only with its
own past values and not with the past values of the disturbances from other equations?
Perhaps so, if the equations relate to very different things,¹⁸ but certainly not
as a general rule. And with a reasonable number of parameters describing the process,
problems of computation and degrees of freedom can become quite severe. We
are learning more and more about how to handle such problems in specific cases,¹⁹
but it is still the case that knowledge that a particular case applies is hard
to come by a priori.

It may be possible to acquire such knowledge from the data, however, and this
brings us to the second possible route to follow. If one makes no assumption
about the autocorrelation of the disturbances and treats lagged endogenous variables
as endogenous, assuming the worst, then a consistent estimate of the equations
can still be obtained using an instrumental variables estimator with current and
lagged exogenous variables as instruments, provided the system is sufficiently
rich in such variables.²⁰ (I shall discuss instrumental variables more generally
below.) The problem with such an estimator is what one would expect, however; it
is robust against all forms of autocorrelation in the disturbances, but it is not
very efficient if one knows that a specific form or class of forms obtains. On
the other hand, it does give estimates of the disturbances (the residuals) which
can be consistently used to investigate the autocorrelation structure, and it does
give correct asymptotic standard errors so that one can make some judgment as to
how important the inefficiency is likely to be.

So I would recommend the following as the best practice, recognizing that
there are more formal ways of accomplishing approximately the same things by way
of the likelihood function. If you are not willing to assume no autocorrelation,
then construct an instrumental variables estimator, using only exogenous and
lagged exogenous variables as instruments (if possible). Look at the parameter
estimates and see if they are reasonable; look at the associated asymptotic standard
errors and see if they are satisfactory. If they are, then stop. If they are
not, take the residuals from the estimated equations, and see if you can discover
a relatively simple autocorrelation structure in them as an approximation. If you
can, then reestimate the system, either using a maximum likelihood technique for
that structure or, more simply, by writing the entire system as one large equation
as in three-stage least squares and estimating it as in three-stage least squares,
allowing for the fact that the covariance matrix of present and past disturbances
has been estimated and is not block diagonal (as in the usual case of 3SLS). Again,
if the system is not too large and therefore probably rich in exogenous variables,
this last and complicated stage may not be necessary.

²⁰ If it is not, one may have to use endogenous variables with a relatively
long lag as predetermined and hope that no serious problem is created. One can
also take advantage of any block-recursive structure the system may have. See
below and Fisher [11].
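The first steps of this procedure can be sketched in code. The following is a minimal illustration, not a definitive implementation: it assumes a hypothetical single equation with one lagged endogenous variable and AR(1) disturbances, and the names `iv_estimate`, `a`, `b`, and `rho` are invented for the example. The lagged endogenous variable is treated as endogenous, only current and lagged exogenous variables serve as instruments, and the residuals are then inspected for autocorrelation.

```python
import numpy as np

def iv_estimate(q, X, W):
    """IV estimate of q = X d + u using instrument matrix W (2SLS form)."""
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]   # first stage
    d = np.linalg.lstsq(X_hat, q, rcond=None)[0]       # second stage
    return d, q - X @ d                                # residuals use the original X

# Hypothetical data: y_t = a*y_{t-1} + b*x_t + u_t, with AR(1) disturbances u_t.
rng = np.random.default_rng(0)
T, a, b, rho = 5000, 0.5, 1.0, 0.7
x, u, y = np.zeros(T), np.zeros(T), np.zeros(T)
ex, eu = rng.normal(size=T), rng.normal(size=T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + ex[t]
    u[t] = rho * u[t - 1] + eu[t]
    y[t] = a * y[t - 1] + b * x[t] + u[t]

q = y[2:]
X = np.column_stack([y[1:-1], x[2:]])           # lagged endogenous, current exogenous
W = np.column_stack([x[2:], x[1:-1], x[:-2]])   # current and lagged exogenous only

d_iv, resid = iv_estimate(q, X, W)
d_ols = np.linalg.lstsq(X, q, rcond=None)[0]    # treats y_{t-1} as predetermined
r1 = np.corrcoef(resid[1:], resid[:-1])[0, 1]   # first-order residual autocorrelation
print("IV:", d_iv, "OLS:", d_ols, "residual autocorrelation:", round(r1, 2))
```

In this simulated setting the instrumental variables estimate of `a` should lie near its true value while least squares overstates it, and a clearly positive `r1` would point toward a simple first-order scheme as the structure to try in a reestimation.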

3.3  Instrumental  Variables 

The preceding discussion naturally brings us to the general topic of instrumental
variables estimators. For purposes of simultaneous equations estimation
(and for other purposes, too), it is convenient to think of these estimators as
2SLS estimators in which not all of the predetermined variables need be used. In
other words, a list of instruments is selected according to certain rules to be
discussed; the variables in $Y_1$ in (2.1) are regressed on those instruments; the
systematic part of those regressions, denoted $\hat{Y}_1$, replaces $Y_1$ in (2.1); finally,
q is regressed on $\hat{Y}_1$ and $Z_1$.
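The construction just described can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the data are simulated, the structural equation is taken to be q = 2 Y1 - Z1 + u1, and the function name `two_stage_iv` and all coefficient values are hypothetical. The point is that a sublist of the excluded predetermined variables can be used as instruments in place of the full list.

```python
import numpy as np

def two_stage_iv(q, Y1, Z1, extra):
    """Regress Y1 on the instrument list [Z1, extra], then q on [Y1_hat, Z1]."""
    W = np.column_stack([Z1, extra])                      # the list includes Z1
    Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]    # systematic part
    X_hat = np.column_stack([Y1_hat, Z1])
    return np.linalg.lstsq(X_hat, q, rcond=None)[0]

rng = np.random.default_rng(1)
T = 4000
Z = rng.normal(size=(T, 4))                    # four predetermined variables
u1 = rng.normal(size=T)
v = 0.8 * u1 + 0.6 * rng.normal(size=T)        # reduced-form error, correlated with u1
Y1 = (Z @ np.array([0.5, 1.0, 1.0, 1.0]) + v)[:, None]
Z1 = Z[:, :1]                                  # the included predetermined variable
q = 2.0 * Y1[:, 0] - 1.0 * Z1[:, 0] + u1

est_all = two_stage_iv(q, Y1, Z1, Z[:, 1:])    # all excluded variables: ordinary 2SLS
est_sub = two_stage_iv(q, Y1, Z1, Z[:, 1:2])   # only one of them as an instrument
print(est_all, est_sub)
```

Both estimates should be close to (2, -1) in this design; the sublist estimate is merely less precise.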

Obviously,  the  rules  by  which  the  list  of  instruments  is  selected  are  quite 
important. They fall into two categories: rules which must be observed if a
consistent estimator is to result, and rules designed to improve efficiency while
safeguarding consistency. The latter rules are largely rules of thumb.

I consider first the rules required if the estimator is to be consistent.
These can best be approached by thinking about the consistency of 2SLS.

A common error is to suppose that 2SLS works because it begins with a consistent
estimate of (some of) the reduced form equations. It does so begin, but
this really has nothing whatsoever to do with its consistency. 2SLS is consistent
because it replaces $Y_1$ with a $\hat{Y}_1$ with certain properties. These are the minimum
properties which any instrumental variables estimator must have.

First, the constructed $\hat{Y}_1$ is a linear combination of predetermined variables.
This is required so that it will be asymptotically uncorrelated with the
disturbance, $u_1$.

Second, there must be enough predetermined variables used in the first-stage
regressions so that the columns of $\hat{Y}_1$ and $Z_1$ are linearly independent. This is
required so that the matrix to be inverted in the second stage will be nonsingular.
Note that this requires there to be at least m excluded predetermined variables;
the total number of predetermined variables, $\Lambda$, must then be at least $\ell + m$;
this is the Order Condition for the identifiability of (2.1). In instrumental
variables, in effect, excluded predetermined variables not used as instruments do
not aid in identification and should not be counted in $\Lambda$ when the Order Condition
is checked.

Third, all the elements of $Z_1$, the included predetermined variables, must
appear in the list of instruments used. This is a point occasionally overlooked.
It is required for the following reason. Write:

(3.1)  $Y_1 = \hat{Y}_1 + V_1$

where $V_1$ is the $T \times m$ matrix of residuals from the first-stage regressions.
The substitution of $\hat{Y}_1$ for $Y_1$ amounts to substitution of (3.1) into (2.1),
obtaining:

(3.2)  $q = \hat{Y}_1 \beta + Z_1 \gamma + u_1 + V_1 \beta$,

so that $V_1 \beta$ appears in the disturbance term in the second stage. Consistency thus
requires not merely that $\hat{Y}_1$ and $Z_1$ be asymptotically uncorrelated with $u_1$, but
also that they be asymptotically uncorrelated with $V_1$. This is guaranteed for $\hat{Y}_1$;
since $\hat{Y}_1$ and $V_1$ come from least squares regressions, they are orthogonal even in
the sample. It is also guaranteed for $Z_1$ for the same reasons, provided that the
elements of $Z_1$ were among the regressors in those regressions. Otherwise, it is
not only not guaranteed, it is very likely to fail, since the reduced form shows
$Y_1$ depending in part on $Z_1$.
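A small numerical check may make the third rule concrete. The sketch below uses simulated data with hypothetical coefficients (a structural equation q = 2 Y1 - z1 + u1, in which z1 is the included predetermined variable and z2 an excluded one); the helper `second_stage` is invented for the illustration. It compares the sample cross-product of z1 with the first-stage residuals when z1 is, and is not, among the first-stage regressors.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u1 = rng.normal(size=T)
v = 0.8 * u1 + 0.6 * rng.normal(size=T)
Y1 = 0.5 * z1 + 1.0 * z2 + v            # the reduced form depends in part on z1
q = 2.0 * Y1 - 1.0 * z1 + u1

def second_stage(W):
    """Return the second-stage coefficients and the cross-product z1'V1 / T."""
    Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]
    V1 = Y1 - Y1_hat
    coef = np.linalg.lstsq(np.column_stack([Y1_hat, z1]), q, rcond=None)[0]
    return coef, z1 @ V1 / T

coef_ok, cross_ok = second_stage(np.column_stack([z1, z2]))   # z1 in the list
coef_bad, cross_bad = second_stage(z2[:, None])               # z1 left out
print(coef_ok, cross_ok)
print(coef_bad, cross_bad)
```

With z1 in the list the cross-product vanishes up to rounding error. With z1 omitted it stays well away from zero, and in this particular design the coefficient on z1 is pulled away from its true value of -1.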

Finally, for essentially the same reason, the same list of instruments must
be used in all the first-stage regressions which will be employed in estimating
(2.1). Without this, there is no guarantee that all the elements of $\hat{Y}_1$ will be
orthogonal to all the elements of $V_1$. Note that different lists of instruments can
be used to estimate different equations. Note also that the property in question
can be achieved by forming $\hat{Y}_1$ using different lists of instruments for different
elements of $Y_1$, and then orthogonalizing by using the constructed $\hat{Y}_1$ together with
$Z_1$ as instruments, that is, by constructing a $\hat{\hat{Y}}_1$ by regressing $Y_1$ on $\hat{Y}_1$ and $Z_1$,
and using $\hat{\hat{Y}}_1$ in the final stage. As will appear below, this is occasionally a
sensible thing to do.
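The orthogonalization device can be sketched as follows, again on simulated data with hypothetical coefficients; `fit`, `ya`, and `yb` are names invented for the example. Two right-hand side endogenous variables are first fitted with different instrument lists, neither of which contains the included predetermined variable z1. The fitted values and z1 are then themselves used as instruments for both variables.

```python
import numpy as np

def fit(W, Y):
    """Least squares fitted values of Y on the columns of W."""
    return W @ np.linalg.lstsq(W, Y, rcond=None)[0]

rng = np.random.default_rng(3)
T = 5000
Z = rng.normal(size=(T, 5))                 # z1 is included; the other four are excluded
z1 = Z[:, 0]
u1 = rng.normal(size=T)
ya = 0.5 * z1 + Z[:, 1] + Z[:, 2] + 0.8 * u1 + 0.6 * rng.normal(size=T)
yb = 0.3 * z1 + Z[:, 3] + Z[:, 4] - 0.5 * u1 + 0.6 * rng.normal(size=T)
Y1 = np.column_stack([ya, yb])
q = 1.0 * ya + 2.0 * yb - 1.0 * z1 + u1

# Step 1: a different instrument list for each element of Y1 (neither includes z1).
Y1_hat = np.column_stack([fit(Z[:, 1:3], ya), fit(Z[:, 3:5], yb)])

# Step 2: orthogonalize by regressing Y1 on the constructed Y1_hat together with z1.
W2 = np.column_stack([Y1_hat, z1])
Y1_hathat = fit(W2, Y1)
V = Y1 - Y1_hathat

# Final stage: regress q on the doubly constructed regressors and z1.
coef = np.linalg.lstsq(np.column_stack([Y1_hathat, z1]), q, rcond=None)[0]
max_cross = np.max(np.abs(W2.T @ V)) / T    # sample orthogonality of V to Y1_hat and z1
print(coef, max_cross)
```

After the second step the residuals are orthogonal in the sample to every final-stage regressor, which is the property the common-list rule is meant to secure.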

These  four  rules,  as  stated,  are  the  minimal  ones  which  must  be  observed  if 
a  consistent  estimator  is  to  be  obtained.  Further  insight  into  the  problem  is  best 
obtained  by  considering  the  reasons  why  one  might  want  to  employ  an  instrumental 
variables  estimator  in  the  first  place. 

One of those reasons has already been mentioned. If there are autocorrelated
disturbances, then lagged endogenous variables ought not to be treated as
predetermined. The other principal reason is a rather different one.

If the model is at all large (especially if lagged endogenous variables are
treated as predetermined), then $\Lambda$, the number of predetermined variables, may be
large relative to T, the sample size. If $\Lambda$ exceeds T, then unrestricted least
squares estimates of the reduced form cannot be computed, despite the fact that T
may be considerably greater than $\ell + m$, the number of parameters in an equation to
be estimated. (This corresponds to the fact that $\Lambda > \ell + m$ implies restrictions
on the reduced form, saving degrees of freedom if they can be used.²¹) In this
situation, 2SLS cannot be used in its original form, although instrumental variables
estimators remain available.

A rather similar situation (of which the above is the limit) arises even for
$T > \Lambda$, if the two magnitudes are close. If the first-stage regressions yield too
close a fit for $Y_1$, then the replacement of $Y_1$ by $\hat{Y}_1$ will be no replacement at
all, and 2SLS will coincide with OLS. While it can be argued that the asymptotic
nature of consistency makes this irrelevant, it still seems self-defeating to use
so many instruments as to secure an estimator which is not properly an instrumental
variables estimator at all.
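The limiting case is easy to reproduce numerically. The sketch below is only an illustration on simulated data with hypothetical coefficients: the sample has T = 30 observations, and the instrument list is lengthened with irrelevant variables until it contains T columns. The helper `tsls` and the array `W_all` are names invented for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 30
W_all = rng.normal(size=(T, T))          # column 0 is z1, column 1 is relevant, rest are noise
z1 = W_all[:, 0]
u1 = rng.normal(size=T)
Y1 = W_all[:, 1] + 0.5 * z1 + 0.8 * u1 + 0.6 * rng.normal(size=T)
q = 2.0 * Y1 - z1 + u1

def tsls(n_instr):
    """2SLS using the first n_instr columns of W_all; also returns the first-stage fit."""
    W = W_all[:, :n_instr]
    Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]
    r2 = 1.0 - np.sum((Y1 - Y1_hat) ** 2) / np.sum((Y1 - Y1.mean()) ** 2)
    coef = np.linalg.lstsq(np.column_stack([Y1_hat, z1]), q, rcond=None)[0]
    return coef, r2

ols = np.linalg.lstsq(np.column_stack([Y1, z1]), q, rcond=None)[0]
for k in (3, 20, 30):
    coef, r2 = tsls(k)
    print(k, "instruments: first-stage fit", round(r2, 3), "estimate", coef)
print("OLS estimate", ols)
```

With 30 instruments and 30 observations the first-stage fit is perfect, so the fitted values reproduce the original variable and the two-stage estimate equals the least squares one.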

Hence, while it can easily be shown that the asymptotically most efficient
estimator of the type described is the one which uses all the predetermined
variables, namely 2SLS, in many practical situations some choice has to be made.

There  are  essentially  two  suggestions  in  the  field  as  to  how  to  make  that 
choice,  given  a  list  of  variables  eligible  to  be  instruments.   There  is  no  similar 
disagreement  on  how  such  a  list  should  be  constructed.   I  turn  first  to  the  latter 
subject,  to  consideration  of  what  makes  a  good  instrumental  variable. 

The first requirement of a good instrumental variable is that it be
predetermined; more precisely, that it be asymptotically uncorrelated with the disturbance
in the equation to be estimated. (It is this that keeps lagged endogenous
variables from being predetermined if there is autocorrelation in the disturbances.)
A variable will be uncorrelated with the disturbance in the equation to be
estimated, as the discussion of recursive systems makes clear, provided two conditions
are satisfied. There must be no simultaneous feedback loops connecting the
equation to be estimated and the equation or equations explaining the potential
instrument (which equation may or may not be explicit in the model); further, the
disturbance from the equation to be estimated must not be correlated with that of
the explanatory variables. Predetermined variables are not generally predetermined
because they are truly non-stochastic; they are so because they satisfy these two
conditions.

Aside from the obvious cases of truly exogenous and lagged endogenous
variables, these conditions can sometimes be satisfied by some of the endogenous
variables themselves. This occurs if the equation system is truly recursive (including
the condition on the disturbances) or, more generally, if it is what I have
elsewhere called "block recursive."²² In the latter case, the coefficient matrix
multiplying the current endogenous variables is block triangular, and the covariance
matrix of the current disturbances is block diagonal. Considered in groups of
variables, called "sectors," there are then no simultaneous feedback loops crossing
sectoral lines and no correlation between disturbances in different sectors, a
condition somewhat more plausible than the comparable condition for recursive
systems. In this case, the sectors can be numbered into a hierarchy with endogenous
variables from low-numbered sectors predetermined with respect to equations in
higher-numbered sectors. In a real sense (this is metatheory), the existence of
any predetermined variables other than non-stochastic ones rests on the assumption
that the equation system to be estimated is embedded in a block recursive structure,
but that system itself may also be block recursive, in which case endogenous
variables from low-numbered sectors become available for use as instruments in the
estimation of equations in high-numbered sectors.

The other requirement of a good instrumental variable is that it push the
endogenous variables in the equation to be estimated. In the limit, it does no good
at all to regress $Y_1$ on variables unrelated to the model, so that $\hat{Y}_1$ turns out to
be asymptotically zero. The model itself, however, provides what is supposed to
be a complete theory of the determination of the endogenous variables and therefore
a complete list of instruments eligible from this point of view. Such instruments
are the current and lagged exogenous variables and the lagged endogenous variables
(if there is no autocorrelation). Any other variable can affect the endogenous
variables only by affecting one of these; otherwise, it should be in the model. Such
another variable thus makes sense as an instrument only when data on the
instruments in the model are lacking. Similarly, the use of a lagged exogenous variable
which does not appear in the current version of the model only makes sense if some
lagged endogenous variable which it affects is not used for, say, reasons of
autocorrelation.

Having  decided  on  a  list  of  eligible  instrumental  variables,  the  problem  now 
arises  of  how  to  choose  among  them.   They  can  be  used  either  by  dropping  some  from 
the  list,  or,  more  generally,  by  selecting  certain  linear  combinations  of  them  for 
use.   As  already  remarked,  there  are  two  suggestions  in  the  field. 

The first of these is due to Kloek and Mennes [21], and essentially involves
using the first m principal components of the instrumental variables, together
with the $\ell$ included predetermined variables (other variants are also given). This
has two obvious advantages. It reduces multicollinearity, since principal
components are mutually orthogonal; and, in some sense, the use of principal components
summarizes the information in the list of instrumental variables.

²² See Fisher [10], [11], and [12] for fuller discussions.

The trouble is, however, that the sense in which that information is
summarized is not obviously the right one. We want our instrumental variables to affect
the endogenous variables in the equation to be estimated. We would like to choose
from our list of instruments linear combinations which preserve the ways in which
those effects operate. Taking principal components, however, does not do this;
rather it chooses linear combinations of the instruments with no regard whatsoever
for the ways in which those instruments affect the variables of interest. The
same principal components will be chosen for use in the estimation of each
equation despite the fact that different endogenous variables appear in the different
equations. Choosing principal components is an efficient way of summarizing the
interrelations among the instruments, but that in itself is not what one wants to
summarize.
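A rough sketch of the principal components approach follows. It is a simplified variant assumed for illustration, not the Kloek and Mennes procedure in full: components are taken from the excluded instruments only and joined with the included predetermined variable. The data are simulated, and `principal_components`, `loadings`, and the coefficient values are hypothetical.

```python
import numpy as np

def principal_components(X, k):
    """Scores of the first k principal components of the columns of X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

rng = np.random.default_rng(5)
T = 500
common = rng.normal(size=(T, 2))
loadings = np.repeat(np.eye(2), 4, axis=1)                    # two groups of four instruments
X_excl = common @ loadings + 0.3 * rng.normal(size=(T, 8))    # highly collinear list
z1 = rng.normal(size=T)
u1 = rng.normal(size=T)
Y1 = X_excl[:, 0] + 0.5 * z1 + 0.8 * u1 + 0.6 * rng.normal(size=T)
q = 2.0 * Y1 - z1 + u1

# The components are computed from the instruments alone; Y1 plays no part here.
P = principal_components(X_excl, 2)
gram = P.T @ P                                                # diagonal: components are orthogonal

W = np.column_stack([z1, P])
Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]
coef = np.linalg.lstsq(np.column_stack([Y1_hat, z1]), q, rcond=None)[0]
print(coef)
```

The call to `principal_components` never sees `Y1`, so the same components would be produced for every equation of a system, which is the feature criticized in the text.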

The situation is not made any different by a theorem of Amemiya [1], which
shows that principal components is asymptotically the most efficient way of
choosing instruments, provided one knows nothing about the reduced form. One always
knows or can find out something about the reduced form, and that knowledge ought
to be used.

I have elsewhere²³ suggested one particular way of utilizing structural
information to select instruments. (There are, of course, many other ways.) The
method I suggested, called SOIV (Structurally ordered instrumental variables),
essentially proceeds in three steps. The first of these establishes a preference
ordering of the instruments relative to a particular right-hand side endogenous
variable (an element of $Y_1$). This is done in terms of closeness to that variable
in the structural equations in a precise sense whose justification is largely
heuristic. Given that preference ordering, the endogenous variable in question is
regressed on the instruments in differing combinations to determine whether an
instrument far down in the ordering has an independent effect on the endogenous
variable in the presence of more preferred instruments (here is where multicollinearity
gets eliminated) or whether it is just using up a degree of freedom. The end
result of this stage is a final list of instruments relative to a particular
right-hand side endogenous variable. That list is used to construct elements of $\hat{Y}_1$.
Since the lists will not be the same for different right-hand side variables, and
need not include all elements of $Z_1$, one of the rules given above is broken, so
the constructed elements of $\hat{Y}_1$ together with the elements of $Z_1$ are then used as
instrumental variables in the construction of a $\hat{\hat{Y}}_1$, as described above.

²³ Fisher [11]. See also Mitchell and Fisher [25].

It is still too early (and too hard) to decide whether the SOIV method is a
useful one. Some guidance along these lines is provided by B. Mitchell [24], who
estimated the various equations of the Brookings Model by SOIV and also using
different numbers of principal components. Naturally, if one uses too few principal
components (or too few instruments generally), one expects to obtain fairly
inefficient estimators as relatively little information is being brought to bear; the
limit of this is loss of identification. On the other hand, if one uses too many
instruments, the first stage of 2SLS will give too good a fit and consistency will
suffer; the limit here is equivalent to ordinary least squares. Mitchell's results
suggest (but only suggest) that five principal components are too few for the
Brookings Model. It is interesting to note that the SOIV estimates resemble the
ten or fifteen principal component estimates, both in terms of point estimates and
in measures of goodness-of-fit. One might speculate that SOIV (which produces a
different set of instruments for each equation in the system) is equivalent to
finding the optimal number of principal components in terms of, say, asymptotic
root mean square error (consistency and efficiency both being counted), but this
is merely a speculation in the absence of more experience both in practice and
with Monte Carlo experiments.

3.4   Nonlinearities 

Nearly every simultaneous equation system of any substantial size is
nonlinear in at least the variables. Even though every structural equation is specified
to be linear, there will generally be nonlinear identities connecting the
variables in the same or different equations. The simplest example of this is the
appearance  of  price,  quantity,  and  revenue  in  the  same  system.   To  the  extent  that 
such  nonlinearities  involve  only  predetermined  variables,  no  particular  problem 
arises;  one  simply  treats  nonlinear  functions  of  predetermined  variables  as  new 
predetermined  variables.  Nonlinearities  in  the  endogenous  variables,  however,  are 
another  matter. 

One  way  of  describing  the  problem  which  arises  in  such  cases  is  to  observe 
that  even  if  the  structural  equations  are  linear  in  parameters  and  disturbances 
(which  I  shall  assume  until  further  notice),  the  reduced  form  will  not  generally 
be  linear  in  anything.   Indeed,  the  solution  for  the  endogenous  variables  in  terms 
of  the  predetermined  variables  and  the  disturbances  may  not  even  exist  in  closed 
form. This is a computational nuisance for purposes of forecasting; it is
apparently a disaster for estimation methods which begin by estimating the reduced form.

The  disaster,  fortunately,  is  only  apparent.   This  is  evident  in  the  case  of 
the  maximum  likelihood  methods  (full  information  or  limited  information)  which 
remain  technically  available  although  computationally  cumbersome.   It  is  also  true 
for 2SLS, broadly considered as an instrumental variable estimator.

We saw above that it is a mistake to think of 2SLS primarily as an estimator
which begins with the OLS estimate of the reduced form. Consistent estimates of
the reduced form equation are indeed obtained in the first stage of 2SLS in a fully
linear system, but this is merely incidental. As already observed, 2SLS works
because it is an instrumental variables estimator obeying certain rules. Consistent
reduced form estimates are not required. Just so, in the present, nonlinear case,
instrumental variables will still work.

There are two issues about what instruments should be used, however, that do
not arise in the fully linear case. The first of these is easy to dispose of.
Suppose, for example, that a particular endogenous variable, say y, appears in the
system both as y and as log y. When it appears on the right-hand side of an
equation in the log form, should one replace it by $\log \hat{y}$ or by $\widehat{\log y}$? That is, should
one obtain $\hat{y}$ by regressing y on the instrumental variables and then take its log,
or should one regress log y directly on the instruments? Consideration of the rules
for instrumental variables given in the preceding section shows immediately that
it is the latter alternative which is correct. If one feels that $\hat{y}$ contains useful
information, so that $\log \hat{y}$ is a natural thing to use (this is unlikely in this case
but not in cases where the nonlinear form is a function of more than one endogenous
variable), then $\log \hat{y}$ can itself be used as an instrument and the information it
embodies employed in this way. The latter procedure is similar to the construction
of $\hat{\hat{Y}}_1$ for orthogonalization purposes, as described earlier in the case in which
different instruments were used to obtain different elements of $\hat{Y}_1$.
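The two alternatives can be compared numerically. The following sketch assumes simulated data in which log y enters a hypothetical structural equation q = 2 log y - z1 + u1; all names and coefficient values are invented for the illustration. It checks the sample orthogonality between the instruments and the part of log y left unexplained under each alternative, and then uses the log of the fitted y as an additional instrument.

```python
import numpy as np

def fit(W, y):
    """Least squares fitted values of y on the columns of W."""
    return W @ np.linalg.lstsq(W, y, rcond=None)[0]

rng = np.random.default_rng(6)
T = 5000
z1, z2 = rng.uniform(-1, 1, size=T), rng.uniform(-1, 1, size=T)
u1 = rng.normal(scale=0.2, size=T)
v = 0.8 * u1 + 0.6 * rng.normal(scale=0.2, size=T)
log_y = 1.0 + 0.3 * z1 + 0.3 * z2 + v          # y itself is exp(log_y), always positive
y = np.exp(log_y)
q = 2.0 * log_y - 1.0 * z1 + u1

W = np.column_stack([np.ones(T), z1, z2])      # instrument list, including Z1 = [1, z1]
Z1 = W[:, :2]

logy_fitted = fit(W, log_y)                    # regress log y directly on the instruments
log_of_yhat = np.log(fit(W, y))                # regress y, then take the log

cross_direct = np.abs(W.T @ (log_y - logy_fitted)).max() / T
cross_indirect = np.abs(W.T @ (log_y - log_of_yhat)).max() / T

# The log of the fitted y can still be added to the list as one more instrument.
W_plus = np.column_stack([W, log_of_yhat])
coef = np.linalg.lstsq(np.column_stack([fit(W_plus, log_y), Z1]), q, rcond=None)[0]
print(cross_direct, cross_indirect, coef)
```

Only the direct regression leaves a remainder orthogonal to the instruments in the sample. Adding the log of the fitted y to the list keeps that property, because the final fitted values again come from a least squares regression on the full instrument list.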

The second issue is more subtle. Since the reduced form is not linear in
the predetermined variables, there is no reason to restrict oneself to the use of
those variables in their original forms as instruments. In the fully linear case,
one was so restricted, because nonlinear functions of the predetermined variables
could affect the endogenous variables only through their correlation with the basic
predetermined variables. In the present case, that is not so. As can be seen by
approximating the reduced form by Taylor series, nonlinear functions of the
predetermined variables do have an independent effect.²⁴ The problem is to decide what
nonlinear functions to use.

Unfortunately, little is known about this issue. Obviously, one wants to use
such forms as will approximate the reduced form while being fairly economical of
degrees of freedom. In some cases, consideration of the form of the reduced form
equations (if they can be obtained) will naturally suggest some things to use. In
other cases, one may have to use a low-order polynomial, possibly only the linear
terms. One must be careful not to use too many terms, however, or inconsistency
will result when the endogenous variables are approximated too closely. More
specific rules cannot now be stated, and the literature on this point is very sparse,
the interesting study by Goldfeld and Quandt [16] being the only one known to me
which bears directly on it.
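As one hedged illustration of the low-order polynomial idea, the sketch below simulates a hypothetical equation in which the square of an endogenous variable appears on the right-hand side, and compares a linear instrument list with one extended by squares and a cross-product of the predetermined variables. The design, the names, and the coefficient values are all assumptions made for the example.

```python
import numpy as np

def fit(W, y):
    """Least squares fitted values of y on the columns of W."""
    return W @ np.linalg.lstsq(W, y, rcond=None)[0]

def r_squared(y, y_fit):
    return 1.0 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(7)
T = 4000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u1 = rng.normal(size=T)
v = 0.8 * u1 + 0.6 * rng.normal(size=T)
y = 1.0 + 0.5 * z1 + z2 + v              # endogenous variable
y_sq = y ** 2                            # enters the structural equation nonlinearly
q = 1.0 * y_sq + 0.5 - 1.0 * z1 + u1

ones = np.ones(T)
Z1 = np.column_stack([ones, z1])         # included predetermined variables
W_lin = np.column_stack([ones, z1, z2])
W_poly = np.column_stack([W_lin, z1 ** 2, z2 ** 2, z1 * z2])

results = {}
for name, W in [("linear", W_lin), ("quadratic", W_poly)]:
    y_sq_hat = fit(W, y_sq)
    coef = np.linalg.lstsq(np.column_stack([y_sq_hat, Z1]), q, rcond=None)[0]
    results[name] = (r_squared(y_sq, y_sq_hat), coef)
    print(name, "first-stage fit:", round(results[name][0], 3), "coefficients:", coef)
```

In this design the quadratic terms raise the first-stage fit because the reduced form for the squared variable is itself quadratic in the predetermined variables. With a short sample, piling on further terms would run into the overfitting problem noted in the text.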

The case in which nonlinearities in the structural equations involve
parameters instead of merely variables is even worse. Here there is often no reasonable
alternative  to  direct  maximization  of  the  likelihood  function,  since  no  parallel 
to  2SLS  may  exist. 

A broad class of exceptions to this statement may occur when the equations
can readily be rewritten to be linear in the parameters with nonlinear constraints
connecting them; this can often be done. (For example, if parameters $\alpha$ and $\beta$
appear both by themselves and multiplied together, they can be renamed as $\lambda_1$, $\lambda_2$, and
$\lambda_3$ with the constraint that $\lambda_3 = \lambda_1 \lambda_2$.) In such a case, if the constraints only
connect the parameters of a single structural equation and do not cross equation
lines, one might proceed as in 2SLS but treating the second stage (after $\hat{Y}_1$ has
been substituted for $Y_1$) as one would an ordinary equation to be estimated by least
squares subject to nonlinear constraints. If the constraints do cross equation
lines, one might do the same thing with the final stage of 3SLS. Unfortunately, it
is not evident that these procedures are always computationally simpler than direct
maximization of the likelihood function (with limited or full information) although
they may be. Moreover, it is not obvious what becomes of the crucial
orthogonalization properties which make 2SLS (or 3SLS) work when the final stage does not
minimize an unconstrained sum of squares. It seems pretty clear that such methods
are consistent. What correct expressions for their asymptotic covariance matrices
are is more difficult to say. This is a fit topic for further investigation.

²⁴ For the relationship of this fact to identification in such systems, see
Fisher [12, Chapter 5].


3.5  Miscellaneous  Topics 

Before  closing  the  discussion  of  applied  theory  I  take  up  a  number  of  topics 
partly  related  to  each  other  and  to  those  already  discussed,  and  partly  independent. 
Each topic concerns something which many or most econometricians probably do
correctly, but on which I have observed occasional confusion.

The first such topic concerns the asymptotic variance-covariance matrix of
2SLS or instrumental variables estimators. It is tempting to suppose that 2SLS
really does work just like a double application of least squares, so that the
asymptotic variance matrix is that which would be given by a regression program
were the estimate indeed obtained in two stages. This is false. The difference
arises because of the introduction of $V_1 \beta$ into the error term when $\hat{Y}_1$ is substituted
for $Y_1$ in (2.1). A double application of least squares would treat the error term
at the second stage as $(u_1 + V_1 \beta)$ and multiply the inverted cross-product matrix of
the regressors ($\hat{Y}_1$ and $Z_1$) by the estimated variance of that composite term. In
fact, however, the orthogonality of $V_1$ to $\hat{Y}_1$ and $Z_1$ in the sample means that the
correct computation does not involve $V_1$ at all but merely the variance of $u_1$. The
latter variance must be computed by substituting the parameter estimates in (2.1)
and taking residuals.
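The two computations can be set side by side. The sketch below is an illustration on simulated data with hypothetical coefficients; dividing by T rather than by a degrees-of-freedom figure is an assumption made for simplicity and does not matter asymptotically.

```python
import numpy as np

rng = np.random.default_rng(8)
T = 5000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u1 = rng.normal(size=T)                   # structural disturbance with variance 1
v = 0.8 * u1 + 0.6 * rng.normal(size=T)
Y1 = 0.5 * z1 + z2 + v
q = 2.0 * Y1 - 1.0 * z1 + u1

W = np.column_stack([z1, z2])
Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]
X_hat = np.column_stack([Y1_hat, z1])     # second-stage regressors
X = np.column_stack([Y1, z1])             # original right-hand side variables
coef = np.linalg.lstsq(X_hat, q, rcond=None)[0]

XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
s2_naive = np.sum((q - X_hat @ coef) ** 2) / T    # estimates the variance of u1 + V1*beta
s2_correct = np.sum((q - X @ coef) ** 2) / T      # estimates the variance of u1 alone
se_naive = np.sqrt(s2_naive * np.diag(XtX_inv))
se_correct = np.sqrt(s2_correct * np.diag(XtX_inv))
print("naive:", se_naive, "correct:", se_correct)
```

Both computations use the same inverted cross-product matrix; only the residual variance differs. In this design the second-stage residual variance is several times the variance of the structural disturbance, so a regression program run literally in two stages would overstate the standard errors.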

Closely related to this are the issues raised by the moderately common
practice of reporting $R^2$ statistics for the structural equations. Such statistics are
frequently ambiguous as to what they refer; they are generally devoid of much
meaning in any case. An $R^2$ for a structural equation can be one of two things. It
can be, as it were, the $R^2$ from the second stage of two-stage least squares; in
this case it involves $V_1 \beta$. We have just seen, however, that the presence of $V_1 \beta$
in the second-stage error term is irrelevant for the correctly computed asymptotic
standard errors. This makes it hard to see why it should be relevant for any
measure of goodness-of-fit. On the other hand, $R^2$ can be reported for the original
structural equation; that is, it can be one less the sum of squared residuals
divided by the centered sum of squares of the dependent variable. This makes somewhat
more sense, but is still very difficult to interpret. Is a high $R^2$ good or bad?
We certainly do not maximize $R^2$ in simultaneous equations; moreover, the
orthogonality properties of least squares which make $R^2$ easy to interpret in terms of
fraction of variance explained are not preserved.
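To see how far apart the two candidate statistics can lie, the following sketch computes both on simulated data. The design and coefficient values are hypothetical, and the particular magnitudes carry no general significance.

```python
import numpy as np

rng = np.random.default_rng(9)
T = 5000
z1, z2 = rng.normal(size=T), rng.normal(size=T)
u1 = rng.normal(size=T)
v = 0.8 * u1 + 0.6 * rng.normal(size=T)
Y1 = 0.5 * z1 + z2 + v
q = 2.0 * Y1 - 1.0 * z1 + u1

W = np.column_stack([z1, z2])
Y1_hat = W @ np.linalg.lstsq(W, Y1, rcond=None)[0]
X_hat = np.column_stack([Y1_hat, z1])
X = np.column_stack([Y1, z1])
coef = np.linalg.lstsq(X_hat, q, rcond=None)[0]

tss = np.sum((q - q.mean()) ** 2)                              # centered sum of squares of q
r2_second_stage = 1.0 - np.sum((q - X_hat @ coef) ** 2) / tss  # residual includes V1*beta
r2_structural = 1.0 - np.sum((q - X @ coef) ** 2) / tss        # residual of the structural equation
print(round(r2_second_stage, 3), round(r2_structural, 3))
```

The same coefficient estimates yield two quite different figures, which is one reason a reported value is ambiguous unless its definition is stated.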

2 
There  is,  of  course,  rather  more  point  in  reporting  R   for  the  reduced  form 

equations,  especially  for  the  unrestricted  least  squares  estimates  of  those  equa- 
tions.  For  the  final  estimates  of  the  reduced  form,  there  are  still  grave  problems 

2 
of  interpretation,  however,  as  onee  again,   R  is  not  being  maximized  and  ortho- 
gonality not  preserved . 


25 

See  Basmann  [  2 ] • 

Indeed, the only proper measures of goodness-of-fit for a structural equation
are provided by the asymptotic standard errors of the coefficients.²⁶ Unfortunately,
these are only asymptotic and relatively little is known about the small sample
distribution of the estimates.²⁷ We do know that if the degree of overidentification
is low, true sample variances may be infinite.²⁸ This obviously means that
asymptotic variances may be very poor guides to true sample variances, but the
implications of that statement are not so severe as may appear at first glance. Even
with thick tails and an infinite variance, it is possible to state in what interval
the central 95 percent of a given distribution lies. It is entirely possible that
confidence intervals formed using the asymptotic standard errors and a t or normal
distribution give a decent approximation to the true confidence intervals. This
is a matter which has not been much investigated in the Monte Carlo literature, and
it should be.
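The kind of Monte Carlo check suggested here can be sketched briefly. What follows is an illustration only, under assumptions that are not in the text: a single equation y = beta*x + u with one instrument z (the just-identified case), normal disturbances, a sample size of 30, and arbitrary parameter values. The function names are hypothetical. The sketch records how often the interval formed from the asymptotic standard error and the normal distribution covers the true coefficient, and where the central 95 percent of the simulated estimates lies.

```python
import random


def iv_sample(rng, n, beta, pi=1.0, rho=0.8):
    """Draw one sample and return (IV estimate, asymptotic standard error).

    Assumed model: y = beta*x + u, x = pi*z + v, corr(u, v) = rho.
    """
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        u = rho * v + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        x = pi * z + v
        zs.append(z)
        xs.append(x)
        ys.append(beta * x + u)
    szx = sum(z * x for z, x in zip(zs, xs))
    szy = sum(z * y for z, y in zip(zs, ys))
    szz = sum(z * z for z in zs)
    b = szy / szx                                   # just-identified IV estimate
    s2 = sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / n
    se = (s2 * szz) ** 0.5 / abs(szx)               # asymptotic standard error
    return b, se


def coverage_experiment(reps=2000, n=30, beta=1.0, seed=0):
    """Return (coverage of nominal 95% interval, central 95% range of estimates)."""
    rng = random.Random(seed)
    draws = [iv_sample(rng, n, beta) for _ in range(reps)]
    hits = sum(abs(b - beta) <= 1.96 * se for b, se in draws)
    estimates = sorted(b for b, _ in draws)
    central = (estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1])
    return hits / reps, central


coverage, central_range = coverage_experiment()
print(f"coverage of nominal 95% interval: {coverage:.3f}")
print(f"central 95% of estimates: {central_range[0]:.3f} to {central_range[1]:.3f}")
```

The central range is reported instead of a sample variance because, in a design of this kind, the simulated variance is dominated by a few extreme draws and is not a stable summary.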

The remaining topics are of a rather different kind. I have occasionally
seen equations estimated by ordinary least squares when one of the right-hand side
variables was obviously endogenous, the excuse being that no simultaneous feedback
appeared in the model as written. It ought to go without saying that the failure
to write down a full model does not make simultaneity disappear. What matters is
whether regressors are in fact correlated with disturbances, not whether one has
written down the mechanism through which that correlation comes about.
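A small simulation makes the point concrete. It is a sketch under assumed values, none of which come from the text: the regressor x is generated by an equation the investigator never writes down, its disturbance v is correlated with the structural disturbance u, and a predetermined variable z is available. With these particular numbers cov(x, u) = 0.8 and var(x) = 2, so the least squares slope should settle near 1.4 and not at the true value of 1.0.

```python
import random


def simulate(n=20000, beta=1.0, seed=1):
    """Return (OLS slope, IV slope) for y = beta*x + u with x correlated with u."""
    rng = random.Random(seed)
    sxx = sxy = szx = szy = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                    # predetermined variable
        v = rng.gauss(0.0, 1.0)                    # disturbance of the unwritten equation
        u = 0.8 * v + 0.6 * rng.gauss(0.0, 1.0)    # structural disturbance, corr(u, v) = 0.8
        x = z + v                                  # regressor generated elsewhere in the system
        y = beta * x + u
        sxx += x * x
        sxy += x * y
        szx += z * x
        szy += z * y
    return sxy / sxx, szy / szx


ols_slope, iv_slope = simulate()
print(f"OLS: {ols_slope:.3f}   IV: {iv_slope:.3f}   (true value 1.0)")
```

Nothing in the estimation step refers to the equation generating x; the inconsistency of least squares arises from the correlation alone.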


26 There have been some attempts to define measures of goodness-of-fit for
the entire model. See Hooper [18].

27 This means, incidentally, that it is incorrect to report estimates as
"significant" if, say, they would be so were they t-distributed as in ordinary
least squares. I would refer to such cases as "quasi-significant."

28 See Basmann [3].

Nor  is  it  true  that  one  must  write  down  and  estimate  the  complete  system  in
order  to  estimate  a  particular  structural  equation  (although  this  may  be  necessary 
for  forecasting  purposes).  Once  again,  a  consistent  estimate  of  the  reduced  form 
is  not  a  necessary  starting  point  for  consistent  structural  estimation,  nor  does 
2SLS  require  complete  specification  of  the  system.  What  is  required  is  a  division 
of  the  variables  in  the  equation  to  be  estimated  into  endogenous  and  predetermined, 
together  with  a  list  of  other  predetermined  variables  in  the  system.   That  list 
ought  to  be  as  comprehensive  as  possible,  but  it  does  not  have  to  be  entirely  com- 
plete.  Moreover,  the  exact  specification  of  the  other  structural  equations  is  not 
needed  at  all. 
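The informational requirements just listed can be made concrete with a short sketch. It assumes one right-hand side endogenous variable, a constant as the only included predetermined variable, and simulated data with arbitrary coefficients; the helper names (`solve`, `ols`, `two_stage_ls`) are hypothetical. The estimator is given only the classification of variables and a list of instruments, once complete and once deliberately incomplete, and never the equation that generates the endogenous regressor.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))   # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def ols(columns, y):
    """Least-squares coefficients of y on the given list of columns."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def two_stage_ls(y, endogenous, included, others):
    """2SLS for y = b*endogenous + c'included + u; returns [b, c...]."""
    instruments = included + others
    gamma = ols(instruments, endogenous)                   # first stage
    fitted = [sum(g * col[i] for g, col in zip(gamma, instruments))
              for i in range(len(y))]
    return ols([fitted] + included, y)                     # second stage


rng = random.Random(2)
n = 5000
const = [1.0] * n
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]
x = [0.5 + a + b + vi for a, b, vi in zip(z1, z2, v)]      # never specified to the estimator
y = [2.0 + 1.0 * xi + ui for xi, ui in zip(x, u)]

b_full = two_stage_ls(y, x, [const], [z1, z2])[0]
b_short = two_stage_ls(y, x, [const], [z1])[0]
print(f"full list: {b_full:.3f}   incomplete list: {b_short:.3f}   (true value 1.0)")
```

Dropping z2 from the list costs precision, not consistency, which is why the list should be as comprehensive as possible without having to be complete.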

A  somewhat  related  point  is  the  view  that  it  is  only  possible  to  proceed  in 
terms  of  a  single  equation  or  a  very  small  subsystem  because  if  one  wrote  down  the 
entire  system,  nothing  would  be  predetermined.  I  shall  return  to  this  in  the  dis- 
cussion of  metatheory,  but  the  discussion  of  predetermined  variables  and  of  block- 
recursive  systems  already  given  indicates  that  this  is  in  error. 

Finally,  I  come  to  the  question  of  the  widespread  practice  of  exploring  the 
model  specification  with  ordinary  least  squares,  reserving  simultaneous  equations 
techniques  for  the  final  estimation  of  the  model  when  specification  has  finally 
been  decided  on.   This  is  clearly  an  inconsistent  way  to  proceed.   If  one  is  lucky, 
OLS  and  simultaneous  equations  estimates  will  not  lie  too  far  apart  and  it  won't 
matter.  If  one  is  persistent,  then  the  specification  search  will  proceed  until 
one  comes  upon  a  specification  which  looks  reasonable  under  both  OLS  and  consistent 
estimation.  In  that  case,  the  two  sets  of  estimates  will  not  lie  too  far  apart  and 
it  will  matter  very  much.  One  wonders  to  what  extent  the  fairly  common  observation 
that  OLS  estimates  are  not  too  far  from  consistent  ones  is  due  to  such  practices. 

4.   Metatheory

I  now  come  to  that  range  of  subjects  which  I  have  called  "metatheory,"  the 
discussion  of  the  real  nature  and  origin  of  simultaneous  equations.   I  have  already 
touched  on  these  topics  in  the  preceding  sections.  They  fall  into  three  somewhat 
related  areas:  simultaneous  equation  systems  as  approximations  to  and  part  of 
larger  and  even  more  simultaneous  systems;  simultaneous  systems  as  approximations 
to  non-simultaneous  systems;  and  the  existence  of  normalization  rules. 

4.1  Is  Everything  Simultaneous? 

Anyone  who  has  ever  built  an  econometric  model  knows  it  is  only  an  approxima- 
tion. Variables  are  left  out  which  are  believed  to  have  negligible  but  not  zero 
effects  —  these  may  be  variables  left  out  of  a  particular  equation  or  left  out  of 
the  entire  model;  variables  taken  as  predetermined  are  generally  not  non-stochastic, 
but  the  equations  explaining  them  are  omitted.  Yet  such  assumptions  are  crucial  in 
simultaneous  equations  estimation.  The  question  of  what  variables  appear  in  what 
equations with non-zero coefficients is basic to the identification of the equations;
the taking of some variables as predetermined plays a similar role. It can
be argued²⁹ that if the true full system of equations were written down, few, if any,
variables would be predetermined and no equation would be identifiable. Under
such a view, structural estimation appears impossible.

It is well to be clear as to the powerful nature of such an argument. It is
not merely argued that the making of bad approximations leads to bad results; the
implication is that the making of any approximations may so lead. In effect, the
argument suggests that structural estimation is so little robust that there is no
such thing as a good approximation. Fortunately, this turns out not to be the
case, although this must not be taken as an excuse for making bad approximations.

29 See Liu [22].

I  turn  first  to  the  question  of  predetermined  variables,  and  the  fact  that 
simultaneous  systems  are  generally  embedded  in  larger  ones  of  which  they  are  a  part 
and  in  which  their  predetermined  variables  are  endogenous.  We  have  already  seen, 
in  our  discussion  of  instrumental  variables,  that  variables  endogenous  to  a  complete 
system  can  be  taken  as  predetermined  in  the  estimation  of  other  equations,  provided 
the  system  is  block-recursive.  In  the  present  context,  this  means  that  the  treatment 
of  some  variables  as  predetermined  in  a  simultaneous  system  is  permissible  provided 
that,  when  the  complete  system  is  written  down,  it  exhibits  two  characteristics. 
First, there must be no simultaneous feedback loop connecting the endogenous variables
of the model and the variables taken as predetermined (even a loop passing
through a third set of variables); this means the matrix of coefficients of the current
endogenous variables of the complete system must be block-triangular. Second,
the  disturbances  of  the  equations  of  the  model  must  not  be  correlated  with  those  of 
the  equations  generating  the  variables  to  be  taken  as  predetermined.   (This  second 
point  is  easy  to  overlook,  but  is  easier  to  believe  in  the  present  context  than  in 
the  case  of  recursive  systems  discussed  above.   It  is  a  sine  qua  non  of  structural 
estimation.)   If  both  these  properties  are  present,  then  the  original  model  can  be 
analyzed  without  regard  for  the  fact  that  it  is  embedded  in  a  larger  one.   Its 
equations  will  be  identifiable  in  the  context  of  the  larger  model  if  (and  only  if) 
they are identifiable when the smaller model is considered alone. Estimation of
the smaller model can (and should) safely take the relevant variables of the larger
model as predetermined.³⁰

30 For further discussion, see Fisher [10] and [12, Chapter 4].

Yet  such  an  analysis  does  not  completely  close  the  question.  There  remains
the  problem  that  the  prior  restrictions  in  the  model  may  be  only  approximately 
right;  that  variables  left  out  of  the  model  may  be  there  with  small  coefficients; 
that  variables  taken  as  predetermined  may  only  be  approximately  so  —  in  other  words, 
that  the  full  model  may  only  be  approximately  block  recursive. 

Fortunately, it turns out that such approximations (if good ones) do not destroy
the possibility of structural estimation. The appearance of such destruction
comes from viewing identification in a misleading way as a totally yes or no
proposition. It is quite possible to define a concept of "near identifiability"³¹ in
which a true equation can be told apart from any candidate which is not very close
to it. Looked at another way, the problem is that of the behavior of the usual
simultaneous equations estimators under very small specification error. It turns
out that, while such estimators are not consistent under such error, the extent of
the inconsistency goes to zero as the specification error does, so that very small
specification errors have very small consequences. Even though (as already mentioned)
the different estimators behave differently under small specification error, they
all have this crucial continuity property in common.³² This does not mean that big
errors don't have big consequences or that it is easy to judge when a given error
is negligible. It does mean that the mere fact of approximation does not destroy
structural estimation.
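The continuity property can be seen in the simplest possible case. The following is a sketch under simplifying assumptions that are not in the text, and it is not the general theorem: there is a single right-hand side endogenous variable x, a single instrument z, all variables have zero means, and the specification error consists of wrongly setting to zero a small coefficient delta with which z itself appears in the equation.

```latex
\begin{align*}
y &= \beta x + \delta z + u, \qquad
   \operatorname{cov}(z,u) = 0, \quad \operatorname{cov}(z,x) \neq 0, \\
\hat\beta_{IV} &= \frac{\sum_t z_t y_t}{\sum_t z_t x_t}
   \;\xrightarrow{\;p\;}\; \frac{\operatorname{cov}(z,y)}{\operatorname{cov}(z,x)}
   = \beta + \delta\,\frac{\operatorname{var}(z)}{\operatorname{cov}(z,x)} .
\end{align*}
```

In this case the inconsistency is proportional to delta and vanishes with it. The same expression shows why a given error is hard to judge: when cov(z, x) is small, even a small delta can have a sizable effect.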

31 See Fisher [12, Chapter 3].

32 This was shown in Fisher [10]. The theorems in question are in part
generalizations of the Proximity Theorem of Wold for OLS. See Wold and Faxer [36].

4.2  Is  Anything  Simultaneous?

At the opposite extreme from the view just discussed, that the true system
exhibits more simultaneity than models assume, lies the view that the true system is
not simultaneous at all. This argument, largely associated with Wold,³²ᵃ observes
that violence is done to common notions of causation when causal flows are circular
and simultaneous. Moreover, causation takes time. If structural equations are
supposed to represent the behavior of decision-makers and their response to the stimuli
provided by certain economic variables, then one must recognize that there is at
least a small lag between stimulus and response. If nothing else, it takes a small
amount of time for the brain to register a stimulus and to respond to it.

On this view, the real world is certainly not simultaneous. If we could only
observe variables at sufficiently small time intervals, simultaneity would disappear.

This  is  a  convincing  argument.   The  really  interesting  question,  however,  is 
whether  it  has  any  important  consequences.   Suppose  that  we  consider  simultaneous 
equations  models  as  approximations  to  true  models  with  very  small  time  lags;  will 
those  approximations  be  good  ones  and  will  this  change  the  way  in  which  we  analyze 
such  models? 

One  way  of  seeing  that  it  may  not  make  very  much  difference  is  to  look  back  to 
our  discussion  of  the  use  of  lagged  endogenous  variables  as  predetermined.   As  we 
take  smaller  and  smaller  time  intervals,  it  does  become  more  and  more  plausible  to 
have  only  lagged  instead  of  current  endogenous  variables  on  the  right-hand  sides  of 
our  equations.   It  also  becomes  less  and  less  plausible  that  disturbances  are  not 
autocorrelated.   Since  the  difficulties  raised  by  using  lagged  endogenous  variables 
as  predetermined  when  the  autocorrelation  in  the  disturbance  is  unity  are  precisely 
those  encountered  in  simultaneous  equation  estimation  with  unlagged  endogenous  vari- 
ables, one  suspects  that  the  presence  or  absence  of  extremely  small  time  lags  may 
not  matter  much. 

32a See [35] and other writings.

A  more  direct  approach  to  this  question  has  been  given  by  Strotz  [34],  who
considered  simultaneous  models  as  approximations  to  underlying  non-simultaneous 
ones  in  the  following  sense.   Suppose  that  we  are  able  to  observe  variables  only  at 
discretely  spaced  points  of  time,  spaced  by  what  we  may  term  an  "observation  inter- 
val" (a  month,  a  quarter,  a  year,  or  what  have  you).   Suppose  further  that  the  true 
underlying  model  is  not  simultaneous,  but  has  a  very  small  lag  which  we  may  term 
the  "reaction  interval."  Let  the  observation  interval  be  fixed  and  the  reaction 
interval  tend  to  zero;   does  treating  the  model  as  simultaneous  make  more  and  more 
sense? 

Strotz analyzed this situation and found an apparent paradox in which the
maximum likelihood estimator of the true model did not approach that of the limiting
simultaneous model (FIML) as the reaction interval went to zero. That paradox
has, however, since been resolved by Dreze and Strotz [7], who showed that the
simultaneous model can be thought of as a limit in more than one way. (It is also
true that the autocorrelation question raised above was not explicitly considered
by Strotz, so that the apparent discontinuity may have been built into the
assumptions on the disturbances.)³³ In any case, simultaneous equations estimation now
seems appropriate as the outcome of such a limiting process.

It does not seem to me, however, that the process just described is the most
appropriate one to consider. Generally (not always), we do not in fact observe
variables at discretely spaced points in time; rather we observe sums or averages
of those variables taken over observation intervals. Gross national product, for
example, while not observed at every moment, is also not reported as of January 1;
rather it is reported as a cumulant over a year or a quarter. Prices are sometimes
an exception, but even here we generally observe or use average prices rather than
momentary ones. (Certainly, we would prefer to use average prices.) Simultaneous
models are framed in terms of such averages over the observation interval, while
the reaction interval is assumed to be small. The limiting process of interest is
that in which the observation interval used for averaging remains fixed, while the
reaction interval tends to zero.

The question of what happens to maximum likelihood estimators in such a situation
has yet to be formally answered, although there seems little doubt that simultaneous
equation estimators will emerge as the appropriate limit. My own work³⁴
has concentrated on the question of what must be true of a simultaneous model in
order that it be capable of being thought of as the limit achieved by neglecting
small lags in such a situation.

Briefly,  it  is  not  hard  to  see  that,  by  averaging  the  true  non-simultaneous 
model  over  the  observation  interval,  one  obtains  a  model  differing  from  the  simul- 
taneous one  principally  by  end  effects.  The  current  endogenous  variables  on  the 
left-hand  sides  of  the  equations  become  their  own  averages,  but  the  slightly  lagged 
endogenous  variables  on  the  right  become  the  same  averages  lagged  by  one  reaction 
interval.   The  latter  averages  differ  from  the  former  in  only  their  first  and  last 
terms.   Clearly,  what  is  required  is  that  such  terms  become  less  and  less  important 
in  the  averaging  process  as  the  reaction  interval  approaches  zero;   this  implies 
some  stability  conditions  on  the  underlying  model  and  corresponding  conditions  on 
the  simultaneous  approximation.   In  the  fully  linear  case,  those  conditions  turn  out 
to  be  that  the  characteristic  roots  of  the  matrix  of  coefficients  of  the  current 
right-hand  side  endogenous  variables  all  be  no  greater  than  unity  in  modulus  and 
that  plus  one  not  be  such  a  root.   There  are  less  detailed  but  equally  stringent
conditions  when  the  model  is  not  fully  linear,  as  will  occur  if  there  are  nonlinear 
identities. 
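The end-effect argument can be written out in a stylized form. The notation here is assumed for illustration and is not taken from the text: the observation interval is divided into n reaction intervals, y_k denotes the endogenous variables in the k-th of them, A is the matrix of coefficients of the (slightly lagged) right-hand side endogenous variables, and bars denote averages over k = 1, ..., n.

```latex
\begin{align*}
y_k &= A\,y_{k-1} + B\,z_k + u_k, \qquad k = 1,\dots,n, \\
\frac{1}{n}\sum_{k=1}^{n} y_{k-1} &= \bar y + \frac{1}{n}\,(y_0 - y_n),
   \qquad \bar y = \frac{1}{n}\sum_{k=1}^{n} y_k, \\
\bar y &= A\,\bar y + B\,\bar z + \bar u + \frac{1}{n}\,A\,(y_0 - y_n).
\end{align*}
```

The last term is the end effect. It is negligible for large n only if y_0 - y_n does not grow with n, which is where conditions on the roots of A enter in this stylized version.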

Moreover,  by  considering  submodels  of  the  original  model  made  up  of  some  but 
not  all  of  the  equations  or,  alternatively,  by  considering  gedanken  experiments 
in  which  some  of  the  equations  are  suppressed  and  the  values  of  the  corresponding 
endogenous  variables  set  by  outside  control,  one  finds  that  similar  conditions  must 
hold  for  all  submodels  which  contain  a  feedback  loop.   The  strength  of  this  condi- 
tion can  be  seen  by  observing  that,  in  the  fully  linear  case,  the  stated  conditions 
on  characteristic  roots  must  hold,  not  only  for  the  matrix  of  coefficients  of  cur- 
rent right-hand  side  endogenous  variables  itself,  but  also  for  all  of  its  principal 
submatrices. 
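A two-equation illustration, which is not in the text, shows what the conditions amount to in a familiar case. Take a consumption function C = aY + u normalized for C and the identity Y = C + I normalized for Y, so that the endogenous variables are ordered (C, Y).

```latex
\begin{align*}
A &= \begin{pmatrix} 0 & a \\ 1 & 0 \end{pmatrix}, \qquad
\det(A - \lambda I) = \lambda^2 - a = 0
   \;\Longrightarrow\; \lambda = \pm\sqrt{a}, \\
|\lambda| \le 1,\ \lambda \ne 1
   &\;\Longleftrightarrow\; -1 \le a < 1 .
\end{align*}
```

The one-by-one principal submatrices are both zero and satisfy the conditions trivially, so in this example the test reduces to a bound on the marginal propensity to consume. If the same two equations are instead normalized the other way round (the first for Y, the second for C), the off-diagonal elements become 1/a and 1, the roots become plus and minus 1/sqrt(a), and the conditions hold only if |a| >= 1 with a different from 1.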

It  seems  likely  that,  by  testing  whether  a  model  satisfies  such  conditions  and 
which  of  its  feedback  loops  fail  them,  difficulties  of  specification  can  be  found. 
The  tests  are  computationally  non-trivial  in  terms  of  computer  time,  however,  and 
experience  with  this  approach  lies  in  the  future. 

4.3  Are  There  Natural  Normalization  Rules? 


The  final  question  of  metatheory  which  I  shall  discuss  is  that  of  the  exist- 
ence of  natural  normalization  rules  for  the  structural  equations  of  a  simultaneous 
model.   We  have  already  observed  that  this  question  conceivably  bears  on  the  choice 
of  an  estimator.   The  sample  estimates  obtained  by  some  estimators  (2SLS,  3SLS  and 
SOIV,  for  example)  depend  on  the  choice  of  a  normalization  rule  for  the  equation 
being  estimated,  whereas,  for  other  estimators  (LIML  and  FIML)  they  are  independent 
of  such  choice.   Exactly  what  this  means  for  the  small  sample  properties  of  the 
estimators  is  unknown,  but  some  interest  clearly  attaches  to  the  question  of  whether 
natural  normalization  rules  exist.   I  believe  that  they  do.
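The dependence of 2SLS on the normalization rule is easy to exhibit numerically. The sketch below rests on assumptions of its own: one equation containing two endogenous variables y1 and y2, two excluded instruments (so that the equation is overidentified; with a single instrument the two normalizations give the same answer), no constant, and simulated data with a true coefficient of 1.5. The helper names are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def projected_inner(a, b, z1, z2):
    """Return a'Pb, where P projects onto the span of the instruments z1 and z2."""
    s11, s12, s22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    det = s11 * s22 - s12 * s12
    a1, a2 = dot(z1, a), dot(z2, a)
    b1, b2 = dot(z1, b), dot(z2, b)
    return (a1 * (s22 * b1 - s12 * b2) + a2 * (s11 * b2 - s12 * b1)) / det


rng = random.Random(3)
n = 200
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.8 * vi + 0.6 * rng.gauss(0, 1) for vi in v]
y2 = [a + b + vi for a, b, vi in zip(z1, z2, v)]
y1 = [1.5 * yi + ui for yi, ui in zip(y2, u)]          # true coefficient 1.5

# Equation normalized for y1: 2SLS regression of y1 on y2.
beta_norm_y1 = projected_inner(y2, y1, z1, z2) / projected_inner(y2, y2, z1, z2)

# Same equation normalized for y2: 2SLS regression of y2 on y1, then inverted.
gamma = projected_inner(y1, y2, z1, z2) / projected_inner(y1, y1, z1, z2)
beta_norm_y2 = 1.0 / gamma

print(f"normalized for y1: {beta_norm_y1:.4f}   normalized for y2: {beta_norm_y2:.4f}")
```

The two figures differ in the sample although both are consistent for the same coefficient; the gap narrows as the sample grows.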

It is important to distinguish the question of the existence of natural
normalization rules from the quite different question of whether or in what sense
right-hand side variables can be said to "cause" left-hand side variables. Debate over
the latter question marked the early period of simultaneous equations literature and
is settled, generally, by the observation that the causal structure is given by the
reduced form where all the predetermined variables cause jointly all the endogenous
variables.³⁵ This, however, is not the same as the question of whether natural
normalization rules exist for individual equations.

The latter question can be approached, I think, in two ways. The first is by
considering the way in which simultaneous equations systems are built. They are not
constructed as a seamless web; rather they are put together equation-by-equation,
with each equation (apart from the identities) representing the behavior of certain
decision-makers. Each equation has a life of its own. One could imagine it in a
different model or in a model in which all but one of the variables were set by
outside control and the remaining variable determined by the decision-makers whose
behavior is modeled. In this context, it is usually clear what the natural
normalization rules are. The very naming of an equation as the "consumption" function, for
example, suggests that it is naturally normalized for consumption, not, say, income,
and that it is consumption decisions which it models.

An apparent class of exceptions to this concerns the classic question of
whether supply and demand curves should be normalized for price or quantity.³⁶ I
regard this not as an exception in which natural normalization rules fail to exist
so much as a case in which we are ignorant of those rules. This comes about because
we have no good theory of disequilibrium price adjustment in competitive markets.
Our theories specify supply and demand curves and state that price somehow adjusts
when supply and demand are not equal. Econometric models which run into this
ambiguity do so because they assume (possibly properly) that market equilibrium is
attained rapidly enough so that price (and quantity) adjustment mechanisms need not
be specified. In such a case, it is indeed not clear what normalization rules should
be because the equations being estimated are the equilibrium states of the true
underlying structural equations and the adjustment mechanism has been suppressed.

35 See Simon [33] for a discussion of causation in block recursive systems.

36 See Schultz [32].

This  becomes  clearer  if  we  consider  the  matter  another  way.  We  saw  above  that 
simultaneous  equation  models  can  plausibly  be  considered  to  be  the  limits  of  non- 
simultaneous  ones  as  time  lags  go  to  zero.   Nobody  doubts  the  existence  of  natural 
normalization  rules  in  such  non-simultaneous  models,  however.   It  seems  natural  to 
take  the  normalization  rules  of  the  limiting  simultaneous  models  as  the  limits  of 
the  rules  for  the  non-simultaneous  ones.   This  is  clearly  in  accord  with  the  notion 
that  we  are  modeling  the  behavior  of  decision-makers  reacting  to  stimuli.

Thus it seems to me that natural normalization rules do indeed exist, although
we may not always be able to specify them. Whether imposing them makes any serious
difference to estimation is basically not known, however.³⁷

This completes our discussion of metatheory and, indeed, of simultaneous equations
estimation. In closing, it may be appropriate to remark that a decade ago,
Econometrica published a symposium on "Simultaneous Equations Estimation: Any
Verdict Yet?"³⁸ I don't really think the jury is still out on the main question,
but there is certainly a lot of evidence to consider.

37 In the tests described in the preceding section, normalization rules do
make a difference. A model may very well pass those tests with one set of
normalization rules and not with another. The considerations which lead to those tests,
however, show clearly that the important question is whether they are passed with
the normalization rules that are natural for the non-simultaneous model being
approximated.

38 [4].